This tutorial describes a clustering analysis of the streets of Camden. It walks through data cleaning and processing, the clustering algorithm, cluster exploration, and the correlation of clusters with street-level crime. The aim of the analysis is to determine which streets are woman-friendly and have the potential to make women feel safe while walking; the data used therefore includes pavement widths, street lights, public toilets and urban greenery.
In the analysis, the following data was used:
data from Camden datastore:
First, let’s load the libraries we’ll be using:
library(sf)
library(here)
library(tmap)
library(tidyverse)
library(leaflet)
library(leafgl)
library(mapdeck)
library(RColorBrewer)
library(dplyr)
library(stringr)
library(ggplot2)
Let’s read in the London Boroughs data and make a Camden outline layer, so we can clip other data to it.
# Reading in London boundaries data
LondonBoroughs <- st_read(here::here("data",
"statistical-gis-boundaries-london",
"ESRI",
"London_Borough_Excluding_MHW.shp"))
# here, we create a CamdenOutline to clip other data to it
CamdenOutline <- LondonBoroughs %>%
filter(., NAME=="Camden")
We want all our data to be in the same projection (EPSG:27700), so let’s make sure CamdenOutline is. Then we plot it to see if everything looks alright.
# check the projection
print(CamdenOutline)
# reproject Camden
CamdenOutlineProjected <- CamdenOutline %>%
st_transform(.,27700)
# let's see CamdenOutline on a map
tmap_mode("view")
qtm(CamdenOutline)
Everything looks alright, so let’s proceed with the pavement width data. It was first processed in QGIS with the Dissolve function, because the original dataset splits streets into smaller segments and the analysis requires data at the street level. The file was then saved as GeoJSON.
# Camden streets - pavement width
# reading in file
streetsGJSON <- st_read(here::here("data", "streets3.geojson"))
# check the projection
print(streetsGJSON)
# change the projection
streets <- streetsGJSON %>%
st_transform(.,27700)
print(streets)
The pavement width data covers the whole of London, so let’s clip it with the Camden outline.
# get rid of the streets outside Camden
pavementsWidth <- streets[CamdenOutlineProjected,]
Let’s see what it looks like on a map.
# see the result
qtm(pavementsWidth)
We’re going to repeat the previous steps with the street lights, trees and public toilets data.
# Camden street lights
streetLights <- st_read(here::here("data",
"Camden Street Lighting",
"geo_export_f77a2cc3-40d7-403c-84fe-64be970169bf.shp"))
# Camden public toilets
publicToiletsCSV <- read_csv(here::here("data",
"Public_Conveniences_In_Camden_Map.csv"))
publicToilets <- st_as_sf(publicToiletsCSV,
coords = c("Longitude","Latitude"),
crs = 4326)
# trees in Camden
treesCamdenCSV <- read_csv(here::here("data",
"Trees_In_Camden.csv"))
The aim of this part is to prepare one dataset for clustering. To do so, we first need to clean our data and prepare it for joining.
Let’s check for any missing values.
# Pavements Width Data
# check if there are any nulls/nas
pavementsWidth %>%
summarize_all(funs(sum(is.na(.))))
## Simple feature collection with 1 feature and 8 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 523973.4 ymin: 180966 xmax: 531554.6 ymax: 187481.4
## projected CRS: OSGB 1936 / British National Grid
## fid id DISTNAME ROADNUMBER CLASSIFICA foW caW toW
## 1 0 0 1 961 0 0 0 0
## geometry
## 1 MULTILINESTRING ((525845 18...
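The `funs()` helper used above is deprecated in recent dplyr releases, so the call may emit a warning. A minimal sketch of the same per-column count written with `across()` follows; it runs on a small made-up table (`toyStreets` and its values are hypothetical, not the Camden data), and for an sf object one would probably want to call `st_drop_geometry()` first so that only the attribute columns are summarised.

```r
library(dplyr)

# Toy attribute table standing in for pavementsWidth
# (hypothetical values, for illustration only)
toyStreets <- tibble(
  DISTNAME   = c("Street A", NA, "Street C"),
  ROADNUMBER = c(NA, NA, "A400"),
  foW        = c(3.25, 0.93, 1.28)
)

# count the missing values in every column
naCounts <- toyStreets %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

print(naCounts)
```

The result has one row and one column per input column, each holding the number of missing values found there.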
There are missing values in the road number column, but we’re not going to use that information, so there’s no need to delete them. Here, we’re also creating a completeData dataset, where we will store all our important variables. We create it by taking the pavementsWidth dataset and adding a new column, propFowTow, which is the ratio of footpath width (foW) to total street width (toW). Furthermore, we calculate the length of each street and store it in the dataset.
# add a new column with the proportion of pavement width to total street width
# and create a new dataframe to store all the data needed for the analysis
completeData <- transform(pavementsWidth, propFowTow = foW/toW)
# calculate the length of a street and add it to completeData dataset
completeData$streetLength <- st_length(completeData$geometry)
# check whether a new column is numeric
is.numeric(completeData$streetLength)
# transform it to numeric
completeData$streetLength <- as.numeric(completeData$streetLength)
Let’s round the numbers, check for any missing values, and plot one of the variables to see if everything looks fine.
# rounding the columns
completeData <- completeData %>%
mutate_if(is.numeric, ~round(., 2))
# check for NAs
sum(is.na(completeData$streetLength))
# see what it looks like
tm_shape(completeData)+
tm_lines("propFowTow",
palette = "RdYlGn",
direction=-1)
We’re going to add public toilets to our complete dataset by the name of the street they are located on. First, let’s check if all the public toilets have that information.
### Public Toilets Data
# check if there are null values in street name column
sum(is.na(publicToilets$Street))
# remove values where the street name is not known
publicToilets <- publicToilets %>%
drop_na(Street)
Once that’s done, we can join it to our complete dataset. We only want to add the columns Name and Street. Based on that, we create a new 0-1 column indicating whether a public toilet is located on the street or not.
# add public Toilets to complete dataset
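# select only the columns we want to add
# (the point geometry is dropped first, because left_join() on an sf object
# expects a plain data frame on the right-hand side in recent sf versions)
toiletsStreet <- publicToilets %>%
st_drop_geometry() %>%
select(Name,Street)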
# join
completeData <- left_join(completeData,toiletsStreet, by= c("DISTNAME"="Street"))
# add a new column 0-1 based on whether there is a toilet by the street or not
completeData$publicToilet <- ifelse(is.na(completeData$Name),0, 1)
# add a new column with public toilets per 10 m of the street
completeData$toilets10m <- (completeData$publicToilet*10)/completeData$streetLength
# check if it worked
head(completeData)
## Simple feature collection with 6 features and 13 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 525658 ymin: 181659 xmax: 530661 ymax: 186393
## projected CRS: OSGB 1936 / British National Grid
## fid id DISTNAME ROADNUMBER CLASSIFICA foW caW toW
## 1 73888 72914 Templewood Avenue <NA> Local Road 3.25 12.20 15.45
## 2 67276 66264 The Old Orchard <NA> Local Road 0.00 4.89 4.89
## 3 62098 61046 Little Albany Street <NA> Local Road 0.93 5.23 6.16
## 4 74037 73063 Branch Hill <NA> Minor Road 10.87 11.61 22.48
## 5 91125 90367 Red Lion Square <NA> Local Road 1.28 13.86 15.14
## 6 71402 70418 Ornan Road <NA> Local Road 4.76 7.86 12.62
## propFowTow streetLength Name geometry publicToilet
## 1 0.21 457.52 <NA> MULTILINESTRING ((525845 18... 0
## 2 0.00 71.86 <NA> MULTILINESTRING ((527514 18... 0
## 3 0.15 38.77 <NA> MULTILINESTRING ((528871 18... 0
## 4 0.48 267.53 <NA> MULTILINESTRING ((526017.2 ... 0
## 5 0.08 245.11 <NA> MULTILINESTRING ((530571 18... 0
## 6 0.38 252.50 <NA> MULTILINESTRING ((527184 18... 0
## toilets10m
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
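One thing to keep in mind with a name-based `left_join()` is that it returns one row per match, so a street with two toilets would appear twice in the result. Whether this happens in the Camden data is not checked here. The sketch below shows one possible way to guard against it by counting toilets per street before joining. All table names, the `nToilets` column and the values are hypothetical.

```r
library(dplyr)

# Toy stand-ins for completeData and toiletsStreet (made-up values)
toyStreets <- tibble(
  DISTNAME     = c("Street A", "Street B"),
  streetLength = c(250, 100)
)
toyToilets <- tibble(
  Name   = c("Toilet 1", "Toilet 2"),
  Street = c("Street A", "Street A")
)

# count toilets per street first, so the join cannot duplicate streets
toiletCounts <- toyToilets %>%
  count(Street, name = "nToilets")

toyJoined <- toyStreets %>%
  left_join(toiletCounts, by = c("DISTNAME" = "Street")) %>%
  mutate(
    # streets without a match get a count of zero
    nToilets     = coalesce(nToilets, 0L),
    # 0-1 flag, as in the tutorial
    publicToilet = as.integer(nToilets > 0),
    # toilet flag per 10 m of street
    toilets10m   = publicToilet * 10 / streetLength
  )

print(toyJoined)
```

With these toy values, Street A keeps a single row even though two toilets match it, and its toilets10m is 1 * 10 / 250 = 0.04.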
Next come the street lights. First, let’s check the projection, see what the data looks like, and plot the street lights.
# check the projection
print(streetLights)
# see how the data looks like
head(streetLights)
# change the crs of streetLights
streetLights <- streetLights %>%
st_transform(.,27700)
print(streetLights)
qtm(streetLights)
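Rather than reading the CRS off the printed output, the check can also be done programmatically. The sketch below builds two made-up points in WGS84 (the same way publicToilets is built from its CSV), reprojects them to EPSG:27700 and compares the result against the target CRS. The object names and coordinates are hypothetical; the same comparison should work for any pair of sf layers before a spatial operation.

```r
library(sf)

# Two made-up points in WGS84 (EPSG:4326), roughly in central London
toyPoints <- st_as_sf(
  data.frame(id = 1:2,
             Longitude = c(-0.14, -0.12),
             Latitude  = c(51.54, 51.52)),
  coords = c("Longitude", "Latitude"),
  crs = 4326
)

# reproject to British National Grid
toyPointsBNG <- st_transform(toyPoints, 27700)

# TRUE when the layer is in the target projection
sameCRS <- st_crs(toyPointsBNG) == st_crs(27700)
print(sameCRS)
```

A point layer created with `crs = 4326` would need the same `st_transform()` call before it is clipped or spatially joined with layers in EPSG:27700.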